Dept. of Environmental Science and Policy, University of Milan
2025-04-02
Today AI is pervasive in tackling complex biochemical challenges, such as:
However, AI systems have been developed at a pace that has outstripped both the development of explainable models and the validation of their robustness.
Protein pocket detection is a key problem in the context of drug discovery and design. It involves the identification of locations on a protein surface where small molecules (usually drugs) are likely to bind.
GENEOnet [1] is a specialized GENEO [2] network model designed to detect protein pockets; it features a shallow architecture composed of a small number of GENEO units.
Notably, it is explainable by design. We compared its performance with other state-of-the-art methods and found that it achieves better results despite its greater simplicity.
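As a rough illustration of what a shallow architecture built from a few GENEO units can look like, here is a toy sketch. It is not GENEOnet's actual implementation: the Gaussian kernels, the one-operator-per-channel layout, the convex-combination weights, the 0.5 threshold and the names `ToyGENEONet`, `gaussian_kernel` and `circular_conv3d` are all hypothetical choices made for illustration. Each unit is a circular convolution with a nonnegative kernel summing to 1, which is equivariant with respect to cyclic translations of the voxel grid and non-expansive in the sup norm.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Isotropic 3D Gaussian kernel, nonnegative and normalised to sum to 1 (L1 norm 1)."""
    ax = np.arange(size) - (size - 1) / 2
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    k = np.exp(-(x**2 + y**2 + z**2) / (2 * sigma**2))
    return k / k.sum()

def circular_conv3d(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Circular 3D convolution via FFT, with the kernel centred at the origin."""
    padded = np.zeros(f.shape, dtype=float)
    padded[: k.shape[0], : k.shape[1], : k.shape[2]] = k
    # Move the kernel centre to index (0, 0, 0) so the output stays aligned with the input.
    padded = np.roll(padded, shift=[-(n // 2) for n in k.shape], axis=(0, 1, 2))
    return np.real(np.fft.ifftn(np.fft.fftn(f) * np.fft.fftn(padded)))

class ToyGENEONet:
    """Hypothetical shallow model: one Gaussian-convolution operator per input channel,
    a convex combination of their outputs, and a threshold producing a pocket mask."""

    def __init__(self, sigmas, weights, threshold=0.5, kernel_size=5):
        w = np.asarray(weights, dtype=float)
        if np.any(w < 0) or w.sum() == 0:
            raise ValueError("weights must be nonnegative and not all zero")
        self.weights = w / w.sum()  # convex weights: nonnegative, summing to 1
        self.kernels = [gaussian_kernel(kernel_size, s) for s in sigmas]
        self.threshold = threshold

    def __call__(self, channels: np.ndarray):
        # channels: (C, X, Y, Z) voxelised features with values in [0, 1]
        score = sum(w * circular_conv3d(c, k)
                    for w, c, k in zip(self.weights, channels, self.kernels))
        return score, score >= self.threshold

# Usage on synthetic features (random numbers, not real protein data).
rng = np.random.default_rng(0)
channels = rng.random((3, 8, 8, 8))
model = ToyGENEONet(sigmas=[0.8, 1.0, 1.5], weights=[1, 2, 1])
score, mask = model(channels)
```

The convex combination is a deliberate choice. With the sup norm taken over all channels, a convex combination of equivariant, non-expansive per-channel operators is again equivariant and non-expansive. The whole toy pipeline therefore keeps both properties up to the final threshold, which preserves equivariance but not non-expansivity.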
GENEOnet fits within the S.A.F.E. ML/AI framework [3].
GENEOnet is built from GENEOs (Group Equivariant Non-Expansive Operators), mathematical tools that can be combined into network models featuring:
Fix two spaces of real-valued functions \(\Phi\), \(\Psi\) and two groups \(G\), \(H\) of transformations of their respective domains.
Definition 1 (GENEO): Given a group homomorphism \(T\colon G\to H\), a map \(F \colon \Phi \to \Psi\) is called a Group Equivariant Non-Expansive Operator if the following hold:
\(F(\varphi \circ g) = F(\varphi) \circ T (g)\) for every \(\varphi \in \Phi\), \(g \in G\) (equivariance)
\(||F(\varphi) - F(\varphi')||_{\infty} \le ||\varphi - \varphi'||_{\infty}\) for every \(\varphi, \varphi' \in \Phi\) (non-expansivity)
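To make Definition 1 concrete, the sketch below uses \(\Phi = \Psi\) = real-valued functions on the cyclic group \(\mathbb{Z}/n\), stored as length-\(n\) arrays. It takes \(G = H = \mathbb{Z}/n\) acting by cyclic shifts and sets \(T\) to the identity. These choices and the helper names (`act`, `conv_geneo`, `convex_combination`, `check_*`) are ours for illustration. A circular convolution whose kernel has L1 norm at most 1 satisfies both properties, and so does a convex combination of such operators. A kernel with L1 norm 2 is still equivariant but fails non-expansivity. The checks only test finitely many inputs, so they illustrate the definition rather than prove anything.

```python
import numpy as np

def act(phi: np.ndarray, g: int) -> np.ndarray:
    """phi o g for a cyclic shift g in Z/n: (phi o g)(x) = phi((x + g) mod n)."""
    return np.roll(phi, -g)

def conv_geneo(kernel):
    """F(phi)(x) = sum_j kernel[j] * phi((x - j) mod n).
    Equivariant for any kernel; non-expansive when sum_j |kernel[j]| <= 1."""
    kernel = np.asarray(kernel, dtype=float)
    return lambda phi: sum(w * np.roll(phi, j) for j, w in enumerate(kernel))

def convex_combination(operators, weights):
    """Pointwise convex combination of operators (weights nonnegative, summing to 1)."""
    return lambda phi: sum(w * op(phi) for w, op in zip(weights, operators))

def check_equivariance(F, phi) -> bool:
    """Test F(phi o g) == F(phi) o T(g) for every shift g, with T = identity."""
    return all(np.allclose(F(act(phi, g)), act(F(phi), g)) for g in range(len(phi)))

def check_non_expansivity(F, phi, psi) -> bool:
    """Test ||F(phi) - F(psi)||_inf <= ||phi - psi||_inf (small float tolerance)."""
    return np.max(np.abs(F(phi) - F(psi))) <= np.max(np.abs(phi - psi)) + 1e-12

rng = np.random.default_rng(0)
phi, psi = rng.random(16), rng.random(16)
smooth = conv_geneo([0.25, 0.5, 0.25])           # L1 norm 1
diff = conv_geneo([0.5, -0.5])                   # L1 norm 1
mix = convex_combination([smooth, diff], [0.3, 0.7])
amplify = conv_geneo([1.0, 1.0])                 # L1 norm 2: equivariant, not non-expansive
```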
Use molecular dynamics (MD) simulation data to assess GENEOnet's robustness to biologically relevant perturbations.
We retrieved MD simulation data from ATLAS [4] and then:
To assess GENEOnet’s robustness [5, 6], we compared the distributions of Overlap and RMSD values for the 37 proteins considered, expecting that:
Boxplots of RMSD and Overlap
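The quantities behind these boxplots can be sketched as follows, under stated assumptions. The poster does not define Overlap, so the sketch assumes it is the Jaccard index between per-atom pocket predictions in two frames. It also assumes every frame is compared with the first frame of the trajectory. RMSD is computed after optimal rigid superposition with the Kabsch algorithm, a standard choice that may differ from the authors' exact pipeline. All function names are hypothetical, and the data below are synthetic.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation mapping P onto Q
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

def pocket_overlap(mask_a, mask_b) -> float:
    """Hypothetical Overlap: Jaccard index of two per-atom boolean pocket predictions."""
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def compare_to_first_frame(coords, masks):
    """coords: (F, N, 3) per-frame coordinates; masks: (F, N) per-frame pocket predictions."""
    rmsds = np.array([kabsch_rmsd(coords[i], coords[0]) for i in range(1, len(coords))])
    overlaps = np.array([pocket_overlap(masks[i], masks[0]) for i in range(1, len(masks))])
    return rmsds, overlaps

def five_number_summary(values) -> dict:
    """Minimum, quartiles and maximum of a distribution, as summarised by a boxplot."""
    keys = ["min", "q1", "median", "q3", "max"]
    return dict(zip(keys, np.percentile(values, [0, 25, 50, 75, 100])))

# Synthetic toy trajectory: 10 noisy copies of one structure with identical predictions.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 3))
coords = np.stack([base + 0.1 * rng.normal(size=base.shape) for _ in range(10)])
masks = np.tile(base[:, 0] > 0, (10, 1))
rmsds, overlaps = compare_to_first_frame(coords, masks)
summary = five_number_summary(rmsds)
```

Repeating this per protein gives one RMSD distribution and one Overlap distribution per trajectory, which is the kind of data a pair of boxplots summarises.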
GENEOs can be used to develop explainable-by-design AI models that are also robust to perturbations in the data, as shown by testing GENEOnet on MD simulations.
An extended version of this short work, featuring additional analyses and tests, has recently been published in Statistics.
SDS 2025 - Milan - 02/03 April 2025